All-relevant feature selection using multidimensional filters with exhaustive search

Authors

  • Krzysztof Mnich
  • Witold R. Rudnicki
Abstract

This paper describes a method for identifying the informative variables in an information system with discrete decision variables. It is targeted specifically at discovering variables that are non-informative when considered alone but become informative when synergistic interactions between multiple variables are taken into account. To this end, the mutual entropy of every possible k-tuple of variables with the decision variable is computed. Then, for each variable, the maximal information gain due to interactions with other variables is obtained. For non-informative variables this quantity follows well-known statistical distributions, which allows truly informative variables to be discerned from non-informative ones. To demonstrate the approach, the method is applied to several synthetic datasets that involve complex multidimensional interactions between variables. It is capable of identifying the most important informative variables, even when the dimensionality of the analysis is smaller than the true dimensionality of the problem. Moreover, the high sensitivity of the algorithm allows the influence of nuisance variables on the response variable to be detected.
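The abstract gives the procedure only in outline. Below is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes all variables are discrete (small integers), uses plug-in entropy estimates in nats, and takes the information gain of a variable x_i as the conditional mutual information I(Y; x_i | S) = I(Y; x_i, S) - I(Y; S), maximised by exhaustive search over all (k-1)-tuples S of the other variables. Significance uses the standard G-test approximation (2N·I is asymptotically chi-squared under conditional independence); the Šidák-style correction for taking the maximum, which treats the tuple tests as independent, is an assumption of this sketch and may differ from the null distribution used in the paper. All function names (`entropy`, `cond_mutual_info`, `chi2_sf`, `max_info_gain`) are hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def entropy(columns):
    """Plug-in Shannon entropy (in nats) of the joint distribution of `columns`."""
    if not columns:
        return 0.0
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum(c / n * math.log(c / n) for c in counts.values())


def cond_mutual_info(y, xi, s_cols):
    """I(Y; X_i | S) = H(Y,S) + H(X_i,S) - H(Y,X_i,S) - H(S), in nats."""
    return (entropy([y] + s_cols) + entropy([xi] + s_cols)
            - entropy([y, xi] + s_cols) - entropy(s_cols))


def chi2_sf(t, df):
    """Upper tail P(chi2_df >= t) for integer df, using only the stdlib.

    With x = t/2: Q(1/2, x) = erfc(sqrt(x)), Q(1, x) = exp(-x), and
    Q(a+1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a+1).
    """
    if t <= 0:
        return 1.0
    x = t / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1))
        a += 1.0
    return min(q, 1.0)


def max_info_gain(data, y, k):
    """For each variable i: (max over (k-1)-tuples S of I(Y; X_i | S), heuristic p-value)."""
    m, n = len(data), len(y)
    card = [len(set(col)) for col in data]
    results = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        tuples = list(itertools.combinations(others, k - 1))  # exhaustive search
        best_ig, best_s = -1.0, ()
        for s in tuples:
            ig = cond_mutual_info(y, data[i], [data[j] for j in s])
            if ig > best_ig:
                best_ig, best_s = ig, s
        # G-test: if X_i is independent of Y given S, 2*N*I is approx. chi-squared.
        df = (len(set(y)) - 1) * (card[i] - 1) * math.prod(card[j] for j in best_s)
        p = chi2_sf(2 * n * best_ig, max(df, 1))
        # Assumption: treat the len(tuples) tests as independent (Sidak correction).
        p_max = 1.0 if p >= 1.0 else -math.expm1(len(tuples) * math.log1p(-p))
        results.append((best_ig, p_max))
    return results


if __name__ == "__main__":
    # Hypothetical synthetic data: y = x0 XOR x1; x2..x4 are pure noise.
    rng = random.Random(0)
    n = 500
    data = [[rng.randint(0, 1) for _ in range(n)] for _ in range(5)]
    y = [a ^ b for a, b in zip(data[0], data[1])]
    for k in (1, 2):
        for i, (ig, p) in enumerate(max_info_gain(data, y, k)):
            print(f"k={k} x{i}: max IG = {ig:.4f} nats, p = {p:.3g}")
```

In the XOR example, x0 and x1 are each marginally independent of y, so with k = 1 their information gain should be close to zero; with k = 2 the exhaustive pairwise search should expose their synergy (gain near ln 2 nats), which is the kind of variable the method targets. The cost grows combinatorially with k, since every (k-1)-tuple of the remaining variables is examined for each variable.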


Similar resources

Feature selection using singular value decomposition and QR factorization with column pivoting for text-independent speaker identification

Selection of features is one of the important tasks in the application like Speaker Identification (SI) and other pattern recognition problems. When multiple features are extracted from the same frame of speech, it is expected that a feature vector would contain redundant features. Redundant features confuse the speaker model in multidimensional space resulting in degraded performance by the sy...


A Monotonic Measure for Optimal Feature Selection

Feature selection is a problem of choosing a subset of relevant features. Researchers have been searching for optimal feature selection methods. 'Branch and Bound' and Focus are two representatives. In general, only exhaustive search can bring about the optimal subset. However, under certain conditions, exhaustive search can be avoided without sacrificing the subset's optimality. One such condit...


Combining Multiple Feature Selection Methods

This paper proposes a feature selection method that combines various feature selection techniques. Feature selection has been realized as one of the most important processes in various applications, especially pattern classification problems. When too many attributes are involved, training a machine to classify patterns into their respective classes is seemingly impossible. Hence, selecting goo...


A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems

Intrusion detection systems are designed to provide security in computer networks, so that if the attacker crosses other security devices, they can detect and prevent the attack process. One of the most essential challenges in designing these systems is the so-called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...


Journal title:
  • CoRR

Volume: abs/1705.05756

Publication date: 2017